Scaling and entropy in p-median facility location along a line 
The p-median problem is a common model for optimal facility location. The task is to place p 
facilities (e.g., warehouses or schools) in a heterogeneously populated space such that the average 
distance from a person's home to the nearest facility is minimized. Here we study the special 
case where the population lives along a line (e.g., a road or a river). If facilities are optimally 
placed, the length of the line segment served by a facility is inversely proportional to the square 
root of the population density. This scaling law is derived analytically and confirmed for concrete 
numerical examples of three US Interstate highways and the Mississippi River. If facility locations 
are permitted to deviate from the optimum, the number of possible solutions increases dramatically. 
Using Monte Carlo simulations, we compute how scaling is affected by an increase in the average 
distance to the nearest facility. We find that the scaling exponents change and are most sensitive 
near the optimum facility distribution. 
Quantitative studies in many branches of science frequently reveal scaling laws where two
sets of observables are related by a power law over several orders of magnitude. Examples
range from astronomy (e.g., Kepler's third law) to biology where, for example, Kleiber's law
states that the metabolic rates of mammals scale approximately as the three-quarter power
of their body mass [1]. Here we look at a problem from economic geography, the
relationship between the spatial distribution of a population and the distribution of
service establishments (e.g., post offices or gas stations).

Physicists typically enjoy the luxury of measuring scal- 
ing exponents in carefully designed and repeatable exper- 
iments. In biology and the social sciences, by contrast, 
the exact circumstances of an experiment are generally 
more difficult to control and to repeat. As a consequence, 
power-law exponents are frequently obfuscated by noise 
in the measurement and in the process generating the 
scaling law itself. The remaining uncertainty can lead to heated debates about whether,
for example, the scaling exponent in Kleiber's law is truly 3/4 rather than 2/3 @,[|. The
available geographic data for the distribution of service establishments leave similar
room for interpretation, so that various scaling laws have been proposed 0-0].

Facing such controversies, theorists often try to cal- 
culate the "correct" exponent from deterministic mod- 
els. One recurring idea is that scaling should emerge 
naturally from some appropriate model if an objective 
function (energy dissipation 0, earnings @, travel distance 0, E3I , etc.) is optimized.
This approach has led to elegant theories, but it leaves one key problem unaddressed.
Knowing that evolutionary biology, human
decisions, or other processes shaping the available empir- 
ical data are intrinsically stochastic, there is in principle 
a huge variety of outcomes. How many different solutions 
are conceivable? How close to optimal does the observed 
solution need to be in order to exhibit the theoretically 
predicted scaling exponent? 

Here we study a model which serves as an example of computational techniques suited to
address these questions. The model is the p-median problem of optimal facility location
along a strongly heterogeneously populated line (e.g., a transcontinental highway). The
task is to place $p$ facilities along the line and find the configuration that minimizes
an objective function, in this case the average distance to the nearest facility [12].
Ignoring small-scale heterogeneity in the population, an analytic calculation predicts a
simple scaling law for the length of the line segments served by different facilities. The
exact optimum locations can be computed numerically for realistic input data and are in
good agreement with the analytic prediction. Using techniques from statistical physics, we
calculate the number of possible facility locations for non-minimal costs. With Monte
Carlo simulations we then quantify how deviations from the optimum make it less likely to
find the theoretical exponent.



I. THE p-MEDIAN PROBLEM 

The challenge in facility location problems is to place $p$ service centers or facilities
so that $n$ demand points are optimally served (see, for example, Ref. [13] for an
overview). Facilities can be hospitals, supermarkets, fire stations, libraries,
warehouses, or any other supply centers providing vital resources to the population living
at the demand points (e.g., households or cities). Here we consider the case where the
demand points are at regular intervals along a one-dimensional geographic object, such as
a road or a river, and where every demand point is a possible location for a facility. The
number of people $N$ who require the facilities' services is assumed to be known at each
demand point. This number is typically very heterogeneous across geographic space.
Depending on the context, there are different strategies for the placement
of the facilities. In this article, we concentrate on the p-median problem, an important
special case, where the objective is to minimize the average distance between a person's
demand point and the nearest facility. (A recent summary of the vast literature on the
p-median problem can be found in Ref. [14].)

FIG. 1: Illustrative example of the p-median problem along a line. The population $N$ is
known at the demand points $q_1, \ldots, q_n$. In this article, the distance between
neighboring demand points is assumed to be constant. Facilities will be placed on $p$ of
these $n$ demand points. (In the figure, $p = 4$.) Their locations $r_1, \ldots, r_p$ are
to be determined so that the average distance between a demand point and the nearest
facility, weighted by $N$, is minimized. After the facilities have been located, the line
can be divided into $p$ segments $s_1, \ldots, s_p$ so that the $i$-th segment corresponds
to the service region of the $i$-th facility.

Let us call the facility locations from left to right $r_1, \ldots, r_p$. These positions
are chosen among the demand points $q_1, \ldots, q_n$, which are equidistant (i.e.,
$q_{i+1} - q_i = \mathrm{const.}$ for $i = 1, \ldots, n-1$) along a line (see Fig. 1). If
the population at $q_i$ is denoted by $N_i$, the p-median problem consists of minimizing
the cost function [23]

$$C(r_1, \ldots, r_p) = \frac{\sum_{i=1}^{n} N_i \min_{j=1,\ldots,p} |q_i - r_j|}{\sum_{i=1}^{n} N_i}. \tag{1}$$
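As a minimal sketch of how Eq. (1) can be evaluated for a given set of facility locations, the following Python function computes the population-weighted mean distance to the nearest facility. The function name `p_median_cost` and the toy data are illustrative assumptions, not part of the original study.

```python
def p_median_cost(q, N, r):
    """Population-weighted mean distance to the nearest facility, Eq. (1).

    q: positions of the demand points, N: population at each demand point,
    r: positions of the facilities (hypothetical argument names).
    """
    weighted_distance = sum(
        Ni * min(abs(qi - rj) for rj in r) for qi, Ni in zip(q, N)
    )
    return weighted_distance / sum(N)


# Toy example: five demand points spaced 1 km apart, two facilities at the ends.
q = [0, 1, 2, 3, 4]
N = [10, 0, 5, 0, 10]
print(p_median_cost(q, N, r=[0, 4]))  # 5 people travel 2 km, 25 people in total -> 0.4
```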



Because only trips to the nearest facility play a role in Eq. (1), the line along which
the demand points are located can be partitioned into $p$ segments or service regions.
Demand points belong to the same segment if and only if they share the same closest
facility, see Fig. 1. The length of facility $i$'s service region is given by

$$s_i = \begin{cases}
\frac{1}{2}(r_1 + r_2) - q_1 & \text{if } i = 1, \\
\frac{1}{2}(r_{i+1} - r_{i-1}) & \text{if } i = 2, \ldots, p-1, \\
q_n - \frac{1}{2}(r_{p-1} + r_p) & \text{if } i = p.
\end{cases} \tag{2}$$

We will now take a closer look at the relation between $s_i$
and the population density around facility $i$.
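The segment lengths of Eq. (2) follow from placing the boundary between two neighboring service regions at the midpoint between their facilities. A short sketch under that reading (the function name `segment_lengths` is an assumption made for illustration):

```python
def segment_lengths(q_first, q_last, r):
    """Lengths s_1, ..., s_p of the service regions, Eq. (2).

    q_first, q_last: positions of the first and last demand point,
    r: facility positions (sorted internally).
    """
    r = sorted(r)
    # Region boundaries: the two ends of the line plus the midpoints
    # between neighboring facilities.
    midpoints = [(r[i] + r[i + 1]) / 2 for i in range(len(r) - 1)]
    bounds = [q_first] + midpoints + [q_last]
    return [bounds[i + 1] - bounds[i] for i in range(len(r))]


print(segment_lengths(0, 10, [2, 4, 9]))  # [3.0, 3.5, 3.5]
```

By construction the lengths add up to the total length of the line, which is a convenient consistency check.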



II. SCALING OF THE LENGTHS OF THE 
SERVICE REGIONS 

At first sight, it is plausible that the spatial density of facilities should follow the
same trend as the population density: where there are more people there should be
proportionately more facilities. However, as we will see shortly, the p-median solution
does not follow this rule, which would give every facility an equal number of customers.
Instead, facilities are less abundant per capita in the high-demand regions than in the
low-demand regions.

FIG. 2: (a) Under the assumption that the population $N$ (gray histogram) varies little
between neighboring demand points, $N$ can be approximated by a continuous function $\rho$
(black curve). (b) The function $\sigma(x)$ is defined as the length $s$ of the segment
covering position $x$. Strictly speaking, $\sigma$ is a piecewise constant function.
However, if the spatial variations in $N$ are sufficiently small, $\sigma$ can be
approximated by a continuous function (indicated by the dotted curve).

For a spatially heterogeneous population distribution $N_i$, it is difficult to deduce
this general trend directly from Eq. (1). With certain approximations, however, the
problem becomes analytically tractable; essentially, we translate the line of reasoning
developed in Refs. [10] and [11] for the two-dimensional p-median problem to the
one-dimensional case. First we define the population density $\rho(x)$, which is the
number of people per unit length in the vicinity of $x$. Equation (1) can be rewritten as

$$C(r_1, \ldots, r_p) = \frac{\int_{q_1}^{q_n} \rho(x) \min_{j=1,\ldots,p} |x - r_j| \, dx}{\int_{q_1}^{q_n} \rho(x) \, dx}, \tag{3}$$

where we have used the new notation to replace sums by integrals. If we allow $\rho$ to be
piecewise constant, this expression is still exact, but later it will be more convenient
to approximate $\rho$ with a continuous function (Fig. 2a).

Next we define $\sigma(x)$ to be the length of the segment serviced by the facility
closest to $x$ (see Fig. 2b). The average distance from facility $j$ to a point $x$ inside
its service region is equal to $g_j \sigma(x)$, where $g_j$ depends on the exact location
of the facility. For example, if $r_j$ is close to the center of the segment,
$g_j \approx \frac{1}{4}$. In the spirit of a mean-field approximation, we will now assume
that $\rho$ varies little over the size of a segment. Then we can replace the exact
distance, $\min_j |x - r_j|$, in the numerator of Eq. (3) with its average $g_j \sigma(x)$,

$$C = \frac{\int_{q_1}^{q_n} \rho(x) \, g \, \sigma(x) \, dx}{\int_{q_1}^{q_n} \rho(x) \, dx}. \tag{4}$$



The index $j$ was dropped in Eq. (4), assuming that most facilities will be close to the
center of their service region so that $g_j$ is approximately constant.

Unlike in Eq. (1), the locations $r_j$ no longer appear explicitly in Eq. (4). Instead we
have to find the function $\sigma(x)$ that minimizes $C$ subject to the constraint that
there are $p$ facilities. This constraint can be expressed as

$$\int_{q_1}^{q_n} [\sigma(x)]^{-1} \, dx = p. \tag{5}$$



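Introducing a Lagrange multiplier $\alpha$, the problem is equivalent to finding the zero
of the functional derivative

$$\frac{\delta}{\delta \sigma(x)} \left[ \frac{g \int_{q_1}^{q_n} \rho(x') \, \sigma(x') \, dx'}{\int_{q_1}^{q_n} \rho(x') \, dx'} - \alpha \left( p - \int_{q_1}^{q_n} [\sigma(x')]^{-1} \, dx' \right) \right] = 0, \tag{6}$$

solved by

$$\sigma(x) = \left[ \frac{\alpha \int_{q_1}^{q_n} \rho(x') \, dx'}{g \, \rho(x)} \right]^{1/2}. \tag{7}$$

The Lagrange multiplier can be eliminated by inserting this expression into Eq. (5).
After some algebra,

$$\sigma(x) = \frac{1}{p} \left[ \int_{q_1}^{q_n} \sqrt{\rho(x')} \, dx' \right] [\rho(x)]^{-1/2}. \tag{8}$$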
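The algebra between Eqs. (7) and (8) can be spelled out in a few lines. The steps below are a sketch that uses only the definitions introduced so far, with $\alpha$ denoting the Lagrange multiplier and $g$ the geometric constant of Eq. (4): inserting the square-root form of $\sigma(x)$ into the constraint (5) fixes the prefactor.

```latex
\begin{align*}
\int_{q_1}^{q_n} \frac{dx}{\sigma(x)}
  &= \left[ \frac{g}{\alpha \int_{q_1}^{q_n} \rho(x')\,dx'} \right]^{1/2}
     \int_{q_1}^{q_n} \sqrt{\rho(x)}\,dx \;=\; p \\
\Longrightarrow \quad
\left[ \frac{\alpha \int_{q_1}^{q_n} \rho(x')\,dx'}{g} \right]^{1/2}
  &= \frac{1}{p} \int_{q_1}^{q_n} \sqrt{\rho(x')}\,dx' \\
\Longrightarrow \quad
\sigma(x)
  &= \frac{1}{p} \left[ \int_{q_1}^{q_n} \sqrt{\rho(x')}\,dx' \right] [\rho(x)]^{-1/2}
\end{align*}
```

Note that both $\alpha$ and $g$ drop out of the final expression, so the predicted segment lengths depend only on $p$ and on the population density.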



The lengths of the service regions are thus inversely proportional to the square root of
the population density. The spatial density of facilities $1/\sigma$ increases
$\propto \rho^{1/2}$, but the per-capita density $1/(\rho\sigma)$ decreases
$\propto \rho^{-1/2}$ with growing population. The square-root scaling is a compromise
providing most services where they are most needed, namely in the densely populated
regions, but still leaving sufficient resources in sparsely populated regions where travel
distances are longer. This result implies an economy of scale: in crowded cities fewer
facilities per capita can supply a larger population than in rural areas. If facilities
and demand points are not restricted to be along a line, but can be placed in
two-dimensional space, the scaling exponent is 2/3 instead of 1/2 [10, 11] (see Sec. VI).
However, economies of scale are also predicted in two dimensions. Empirical studies have
indeed reported this effect for certain classes of real facilities.
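The trade-off described above can be checked numerically within the mean-field approximation. The sketch below evaluates Eq. (4) as a Riemann sum for two placement rules that both satisfy the constraint (5): the square-root rule of Eq. (8) and the "equal number of customers per facility" rule $\sigma \propto 1/\rho$. The density profile, the value $g = 1/4$, and all function names are assumptions chosen for illustration.

```python
import math


def mean_field_cost(rho, sigma, dx, g=0.25):
    """Eq. (4) as a Riemann sum: g * integral(rho * sigma) / integral(rho)."""
    numerator = sum(r * s for r, s in zip(rho, sigma)) * dx
    return g * numerator / (sum(rho) * dx)


def sigma_sqrt(rho, p, dx):
    """Eq. (8): sigma(x) = (1/p) * integral(sqrt(rho)) * rho(x)**(-1/2)."""
    norm = sum(math.sqrt(r) for r in rho) * dx / p
    return [norm / math.sqrt(r) for r in rho]


def sigma_equal_share(rho, p, dx):
    """Alternative rule: every facility serves the same number of people."""
    norm = sum(rho) * dx / p
    return [norm / r for r in rho]


# Hypothetical density: a sparsely populated line with one dense city.
dx = 1.0
rho = [1.0 + 99.0 * math.exp(-((x - 500) / 50.0) ** 2) for x in range(1000)]
p = 20

for rule in (sigma_sqrt, sigma_equal_share):
    sigma = rule(rho, p, dx)
    facilities = sum(dx / s for s in sigma)  # left-hand side of Eq. (5)
    print(rule.__name__, round(facilities, 6), mean_field_cost(rho, sigma, dx))
```

Both rules use the same number of facilities, but the square-root rule gives the lower mean-field cost, as the Cauchy-Schwarz inequality guarantees for any non-constant density.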



III. EXACT SOLUTION FOR EMPIRICAL 
POPULATION DISTRIBUTIONS 

The calculation in the previous section assumes that the population density $\rho(x)$
varies little within a service region. As we can see from Eq. (8), this implies that the
segment length $\sigma(x)$ is also a smooth function (Fig. 2b). Real census data, however,
typically reveal strongly varying populations even on small spatial scales. In Fig. 3a-d,
we show population numbers near three US Interstate highways and the navigable Mississippi
River. The data were generated from the US census of the year 2000. First, Interstates 5,
10, 90 and the Mississippi River were parameterized by arc length and markers were placed
at regular 1-km intervals. Then census blocks within 10 km of the highways or the
Mississippi were identified and their population assigned to the nearest kilometer marker.
As is clear from Fig. 3a-d, none of the four populations is a smooth function. Whether the
assumptions behind Eq. (8) are valid is questionable, but it turns out that the scaling
law for the service regions still holds with surprising accuracy.
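The aggregation step described above can be sketched as follows. This is a simplified illustration, not the preprocessing actually used for the census data: it assumes that every census block has already been projected onto the line, so that its arc-length position and its distance from the line are known. The function name and the input format are hypothetical.

```python
import math


def bin_population(blocks, n_markers, max_offset_km=10.0):
    """Sum block populations at the nearest 1-km marker.

    blocks: iterable of (arc_km, offset_km, population), where arc_km is the
    arc-length position of the block's projection onto the line and offset_km
    is the block's distance from the line (assumed, precomputed inputs).
    """
    N = [0] * n_markers
    for arc_km, offset_km, population in blocks:
        if offset_km > max_offset_km:
            continue  # block is too far from the road or river
        k = math.floor(arc_km + 0.5)        # nearest marker 0, 1, 2, ...
        k = min(max(k, 0), n_markers - 1)   # clamp to the ends of the line
        N[k] += population
    return N


blocks = [(0.2, 3.0, 100), (0.6, 1.0, 50), (2.4, 12.0, 999), (2.49, 9.9, 7)]
print(bin_population(blocks, n_markers=4))  # [100, 50, 7, 0]
```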

To compute the scaling exponent, $p = 100$ facilities are placed on each of the four test
data sets. The optimal locations are calculated with the efficient algorithm of Ref. [12].
Their positions along the roads and the river in geographic space are shown in Fig. 3e.
The segment lengths $s_i$ are calculated for each facility $i = 1, \ldots, p$. In Fig. 4,
$s_i$ is plotted versus the mean value of $N$ inside the segment, denoted by
$\langle N \rangle_i$. Ordinary least-squares fits of

$$\log(s_i) = a \log \langle N \rangle_i + \mathrm{const.} \tag{9}$$



to the data yield slopes $a = -0.511$ (I-5), $-0.514$ (I-10), $-0.504$ (I-90), and
$-0.496$ (Mississippi River), close to the prediction $a = -1/2$ of Eq. (8). The
correlations are strong; $R^2$ is consistently bigger than 0.89. Assuming that the
residuals are log-normally distributed, the predicted value $-1/2$ is in all cases within
the 95% confidence intervals. Thus, the equivalent of Eq. (8),
$s_i \propto \langle N \rangle_i^{-1/2}$, obtained by replacing the continuous variables
$\rho$ and $\sigma$ by their discrete counterparts $\langle N \rangle_i$ and $s_i$, is a
good approximation. This observation demonstrates that scaling at the exact p-median
configuration is robust even in the presence of strong spatial fluctuations.
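That the one-dimensional problem can be solved exactly is easy to illustrate with a textbook dynamic program. The sketch below is not the algorithm of Ref. [12], which is more efficient; it is a simple $O(p n^2 \log n)$ scheme that exploits two facts: in one dimension the optimal service regions are contiguous, and the best single site for a contiguous group of demand points is a weighted median. All names are assumptions made for illustration, and the total population is assumed to be positive.

```python
from bisect import bisect_left
from itertools import accumulate


def solve_p_median_line(q, N, p):
    """Exact 1D p-median by dynamic programming; returns (cost C, facility sites)."""
    n = len(q)
    W = [0] + list(accumulate(N))                                 # prefix sums of N
    M = [0] + list(accumulate(Ni * qi for Ni, qi in zip(N, q)))   # prefix sums of N*q

    def one_facility(i, j):
        """Weighted distance sum and best site index for demand points i..j."""
        half = (W[j + 1] - W[i]) / 2
        m = bisect_left(W, W[i] + half, i + 1, j + 2) - 1         # weighted median
        left = q[m] * (W[m] - W[i]) - (M[m] - M[i])
        right = (M[j + 1] - M[m + 1]) - q[m] * (W[j + 1] - W[m + 1])
        return left + right, m

    INF = float("inf")
    # best[k][j]: minimal weighted distance for the first j points with k facilities
    best = [[INF] * (n + 1) for _ in range(p + 1)]
    cut = [[0] * (n + 1) for _ in range(p + 1)]
    best[0][0] = 0.0
    for k in range(1, p + 1):
        for j in range(k, n + 1):
            for i in range(k, j + 1):          # last group = points i-1 .. j-1
                c = best[k - 1][i - 1] + one_facility(i - 1, j - 1)[0]
                if c < best[k][j]:
                    best[k][j], cut[k][j] = c, i

    sites, j = [], n                           # backtrack the optimal groups
    for k in range(p, 0, -1):
        i = cut[k][j]
        sites.append(q[one_facility(i - 1, j - 1)[1]])
        j = i - 1
    return best[p][n] / W[n], sites[::-1]


print(solve_p_median_line([0, 1, 2, 3, 4], [10, 0, 5, 0, 10], p=2))  # (0.4, [0, 4])
```

For data sets with thousands of demand points and $p = 100$ this sketch is slow in pure Python, which is one reason to prefer a dedicated algorithm in practice.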



IV. THE NUMBER OF CONFIGURATIONS 
FOR NON-MINIMAL COSTS 

That the square-root scaling of the service regions is discernible even for realistically
heterogeneous input establishes a potential link to previous empirical work.
Data collected in Ref. 0-0 suggest, at least for certain 
classes of facilities, a sublinear dependence of service 
facilities on population numbers. It has been conjec- 
tured that the p-median model j| or a generalization 
thereof [!, 0] might explain this trend. Admittedly, we 
are looking in this article at a simplified linear geometry. 
Yet the fact that sublinear scaling is robust even for substantially noisy input might be
viewed as supporting evidence for this conjecture.

FIG. 3: Population $N$ as a function of position $x$ along (a) Interstate 5,
(b) Interstate 10, (c) Interstate 90, (d) the navigable part of the Mississippi River. The
small squares below the $x$-axes indicate the optimal p-median positions of 100
facilities. (e) Map of the roads, the river, and the facility locations.

FIG. 4: (Color online) The length of a service region $s$ versus the mean population
$\langle N \rangle$ in this service region. Lines indicate least-squares fits to Eq. (9).
Scaling is in good agreement with the analytic prediction
$s \propto \langle N \rangle^{-1/2}$.



However, there is more to the problem than first meets the eye. Although it is
mathematically convenient to assume that facilities are placed to minimize an objective
function such as Eq. (1), it is far from clear that the exact minimum will be achieved in
reality. Decisions about facility locations are probably more haphazard in real life. For
example, site selections may be swayed by political interests or short-term fluctuations
in property prices, or be based on an incomplete knowledge of the actual demand. Even if
the best effort is made to reach the global optimum, "accidents of history" may keep the
facility locations trapped in a costlier local optimum. It seems overly optimistic to draw
conclusions about the scaling of real service regions only from the best of all solutions.
The available literature for real facility distributions 0- 
0] - rather than the numerically optimal ones discussed 
in Sec. III - also justifies cautious skepticism, because some significant differences
from the p-median result have been observed in reality, albeit in two dimensions.

How many facility configurations with costs near, but not necessarily equal to, the global
minimum exist? There is no simple way to answer this question. Although the algorithm of
Ref. [12] can find the global optimum very efficiently, it does not provide information
about non-optimal solutions. Scanning all possible configurations is out of the question
because their number is too vast. Even for our smallest test data set (I-5) there are
$\binom{n}{100} \approx 3.5 \cdot 10^{175}$ different ways to locate the facilities. The
situation is reminiscent of many-particle systems in physics where one wishes to calculate
the large number of micro-states at a certain energy level out of an even larger number of
all conceivable micro-states. In that context, statistical mechanics has developed many
powerful numerical tools. We will build on this analogy in order to estimate the number of
non-optimal facility locations.
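To get a feeling for the size of the configuration space, the number of ways to choose $p$ facility sites among $n$ demand points can be computed exactly with integer arithmetic. In the sketch below, `n = 2000` is a hypothetical round number used only for illustration; it is not the number of demand points of any of the four data sets.

```python
import math


def log10_num_configurations(n, p):
    """log10 of the binomial coefficient C(n, p), computed from the exact integer."""
    return math.log10(math.comb(n, p))


# Hypothetical line with 2000 demand points and 100 facilities.
print(log10_num_configurations(2000, 100))
```

Even for this modest assumed line length the count has more than 170 decimal digits, which rules out exhaustive enumeration.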

FIG. 5: (Color online) The entropy $S$ (i.e., the logarithm of the density of states)
versus the cost $C$ (in km). The inset shows the same four curves as the main panel, but
with rescaled abscissa $(C - C_{\min})/n$, where $n$ is the number of demand points.



Let us call $\Omega(C)\,dC$ the number of facility locations with costs between $C$ and
$C + dC$. The function $\Omega(C)$ plays the role of the "density of states" in
statistical mechanics. As we will see, $\Omega$ increases very rapidly as $C$ exceeds the
minimum $C_{\min}$, so that it will be more convenient to work with its logarithm, the
entropy $S(C) = \log \Omega(C)$. Our aim is to calculate $S$ with Monte Carlo simulations.
Several methods exist [15]-[17]; here we apply the Wang-Landau algorithm [18]. First, the
range of possible costs is divided into small discrete intervals of length $\Delta C$.
Then a random walk through the set of facility locations is performed and we count, in the
form of a histogram, how often each interval is visited. The main idea behind the
Wang-Landau algorithm is to bias the random walk in such a manner that all intervals are
visited equally often. For such a "flat histogram" we obtain equally good statistics for
all intervals, an advantage when $S(C)$ is the basis of further calculations. We describe
details of our implementation in App. A.
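For very small instances the entropy can be obtained by brute-force enumeration, which makes the definition of $\Omega$ and $S$ concrete. The following sketch bins the cost of every possible configuration into intervals of length $\Delta C$ and takes the logarithm of the counts. The toy population and the names `cost` and `entropy_by_enumeration` are assumptions for illustration; this approach does not scale to the data sets studied here.

```python
import math
from collections import Counter
from itertools import combinations


def cost(q, N, r):
    """Eq. (1): population-weighted mean distance to the nearest facility."""
    return sum(Ni * min(abs(qi - rj) for rj in r) for qi, Ni in zip(q, N)) / sum(N)


def entropy_by_enumeration(q, N, p, dC):
    """Map cost-bin index k to log(number of configurations with k*dC <= C < (k+1)*dC)."""
    counts = Counter(int(cost(q, N, r) // dC) for r in combinations(q, p))
    return {k: math.log(c) for k, c in sorted(counts.items())}


# Toy line with 12 demand points and 3 facilities: 220 configurations in total.
q = list(range(12))
N = [1, 1, 2, 8, 30, 9, 2, 1, 1, 5, 20, 4]
S = entropy_by_enumeration(q, N, p=3, dC=0.25)
for k, s in S.items():
    print(f"C in [{k * 0.25:.2f}, {(k + 1) * 0.25:.2f}):  S = {s:.3f}")
```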

From calculations for four different empirical population distributions (Fig. 5) it is
clear that $S$ is singular at $C_{\min}$, the smallest possible cost. Thus, $S$ increases
enormously in the vicinity of $C_{\min}$ and the density of states $\Omega = \exp(S)$
grows even more rapidly. The results for the four population distributions suggest that
$S$ follows approximately the same curve (inset of Fig. 5) if regarded as a function of
$(C - C_{\min})/n$, where $n$ is the total number of demand points $q_1, \ldots, q_n$.
Therefore, it appears to be a universal feature that for all realistic populations a large
number of different possible configurations must be considered if the assumption of
optimality is relaxed. This observation raises the question: Can the scaling relation of
Eq. (8) still be observed if facility locations are not exactly optimal, but are among the
numerous configurations achieving almost but not exactly $C_{\min}$?

FIG. 6: (Color online) (a) The mean scaling exponent $\langle a \rangle$ and (b) the
coefficient of determination $R^2$ as a function of the cost $C$.



V. IS SCALING DETECTABLE FOR
NON-MINIMAL COSTS?



If we randomly select a facility configuration with a cost in the interval
$[C, C + dC)$, we can formally obtain the scaling exponent $a$ from Eq. (9) as follows.
First, we log-transform the segment lengths $s_i$ and the population density
$\langle N \rangle_i$. Then a least-squares linear fit to Eq. (9) is performed to
calculate $a$. This procedure can be coupled with the Wang-Landau algorithm so that, at
every step in the random walk through configuration space, we compute $a$ and the cost
$C$, and at the end of the algorithm the mean value $\langle a \rangle$ as a function of
$C$.
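The fit of Eq. (9) for a single configuration amounts to ordinary least squares on log-transformed data. A minimal sketch, assuming that all segment lengths and mean populations are strictly positive (the function name `loglog_fit` is an illustrative choice):

```python
import math


def loglog_fit(s, N_mean):
    """Least-squares fit of log(s) = a * log(<N>) + const; returns (a, R^2)."""
    x = [math.log(v) for v in N_mean]
    y = [math.log(v) for v in s]
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx                   # slope = scaling exponent
    r_squared = sxy ** 2 / (sxx * syy)
    return a, r_squared


# Synthetic data that follow s = 8 * <N>**(-1/2) exactly.
a, r2 = loglog_fit(s=[8, 4, 2, 1], N_mean=[1, 4, 16, 64])
print(a, r2)  # close to -0.5 and 1.0
```

Calling such a routine at every step of the random walk and averaging $a$ within each cost interval yields the mean exponent as a function of $C$.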

The results, shown in Fig. 6a, indicate that $\langle a \rangle$ is approximately $-1/2$
at the minimum cost $C_{\min}$ for all four numerical test sets, as anticipated by our
earlier calculations. As the cost increases, $a$ also increases, indicating a decreasing
dependence of the segment lengths on the population. This behavior makes sense because the
facility locations become more random as we move away from the optimum. Interestingly, the
overall trend of how $\langle a \rangle$ increases with $C$ is similar in all four cases.
In particular, the behavior near the minimum is noteworthy because $\langle a \rangle$
increases most rapidly near $C_{\min}$. In other words, the analytic prediction at
$C_{\min}$, which provides us with the only easily calculable reference point for $a$, is
unfortunately at the point where small deviations can also cause the greatest changes in
$a$.

Together with the least-squares exponent $a$, we can also obtain other statistical
measures from linear regression, such as the coefficient of determination $R^2$
(Fig. 6b). It can take values between 0 and 1; the higher its value, the stronger the
correlation between $s$ and $\langle N \rangle$. In our numerical test sets, $R^2$ takes
its highest value ($\approx 0.9$) at $C_{\min}$ and decreases as we move toward higher
costs, following a slightly sigmoidal curve toward values around zero. At very large
costs, $R^2$ increases again because the solution is effectively an "obnoxious facility"
location where facilities are in sparsely populated regions so that $s$ and
$\langle N \rangle$ are positively (instead of negatively) correlated. For costs near
$C_{\min}$, however, an increase in $\langle a \rangle$ is coupled with a reduction in
$R^2$.



VI. CONCLUSION 

In this article, we have studied the one-dimensional p-median problem. In one dimension,
the exact optimum can be calculated numerically in polynomial time; in two dimensions [19]
and on arbitrary graphs [20], the p-median problem is NP-complete so that no
polynomial-time algorithm is currently known. Previous empirical studies of scaling in
real facility locations have usually dealt with two-dimensional densities. The approximate
analytic result in one dimension, $\sigma \propto \rho^{-1/2}$ [Eq. (8)], can be easily
generalized for arbitrary dimension $d$, where the size of a $d$-dimensional Voronoi cell
$\sigma$ is predicted to scale as $\rho^{-d/(d+1)}$. The scaling of the facility density
$1/\sigma$ with the population density $\rho$ thus remains sublinear in all dimensions.
Numerical optimization in two dimensions, based on US census data, indeed yields an
exponent in excellent agreement with the predicted exponent
a = -2/3 O. 

In 1977, Stephan implicitly proposed that the p- 
median model might explain empirical scaling relations 
between the area and population density of subnational 
administrative units (e.g., states, provinces, counties) Q. 
Although he later generalized the objective function as 
more data became available Q, the notion that facili- 
ties may self-organize towards sublinear scaling has re- 
mained attractive, as proved by the recent rediscovery of 
Stephan's model by Um et al. Q- 

However, as the work shown here underlines, one has to be careful when interpreting
empirical data. Increased spatial noise in the facility distribution can lead to different
exponents and reduced correlations. The situation investigated here portrays only one
special scenario of how randomness might be present, namely as a uniform probability
distribution over all configurations with costs in an interval $[C, C + dC]$. It is also
conceivable that not all configurations within this range are equally likely, so that the
best-fit exponents may behave differently. We may also replace the p-median model by a
different optimization principle (e.g., competitive facility location such as the
Hotelling model [21]), which can change the exponent at the optimum. However, we believe
that a steep increase in the number of possible configurations is a generic tendency of
most models that relax the constraint of strict optimization even to a small degree.

Appendix A: Wang-Landau algorithm to calculate
the density of states

The Wang-Landau algorithm [18, 22] is designed to calculate the density of states
$\Omega(C)$ for $C$ in some interval $[C_1, C_2]$. First the interval is divided into
small sub-intervals of length $\Delta C$. The key element of the Wang-Landau algorithm is
to visit every sub-interval $[C, C + \Delta C]$ with a probability
$\propto [\Omega(C)]^{-1}$. Initially, the density of states is of course unknown (this is
why we need the algorithm in the first place), but we will recursively obtain better
estimates for $\Omega$ as the calculation proceeds. At the beginning we set
$\Omega(C) = 1$ for all intervals $[C, C + \Delta C]$. Simultaneously we maintain a
histogram $H(C)$, which counts how often a cost between $C$ and $C + \Delta C$ is
encountered during the course of a random walk. At the beginning $H(C) = 0$ for all $C$.

The random walk through the set of facility locations proceeds as follows. Starting from
an arbitrary initial configuration $\mathbf{r} = (r_1, \ldots, r_p)$ with cost $C$, a new
set of facility positions $\mathbf{r}' = (r'_1, \ldots, r'_p)$ with cost $C'$ is generated
with probability $P(\mathbf{r} \to \mathbf{r}')$. In addition, a uniform random number
$u \in [0, 1]$ is generated. If $u < \min(1, \Omega(C)/\Omega(C'))$, the current value of
$\Omega(C')$ is multiplied by a constant factor $f$, $H(C')$ is incremented by 1, and
$\mathbf{r}'$ becomes the next step in the random walk. Otherwise, the move is rejected,
and we update $\Omega(C)$ and $H(C)$ instead of $\Omega(C')$ and $H(C')$. Following Wang
and Landau's original paper, Ref. [18], we initially set $f$ equal to Euler's number
$e = 2.71828\ldots$ When the histogram $H(C)$ is sufficiently "flat", $f$ is replaced by
its square root (i.e., $f \leftarrow \sqrt{f}$). For practical purposes, the histogram is
treated as flat if the maximum number of visits recorded by $H(C)$ is less than 10% more
than the minimum. If this condition is satisfied, all $H(C)$ are reset to 0, and the
procedure is iterated until $f < \exp(10^{-5})$.
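The flatness criterion and the refinement schedule of the modification factor can be made concrete with a few lines of code. The sketch below works with $\ln f$ instead of $f$, so that taking the square root becomes a halving; the function names are illustrative assumptions.

```python
def is_flat(H, tolerance=0.10):
    """Flat if the most-visited interval has less than 10% more visits than the least-visited."""
    return max(H) < (1.0 + tolerance) * min(H)


def refinement_stages(ln_f_start=1.0, ln_f_stop=1e-5):
    """Count how often f -> sqrt(f) is applied before f < exp(ln_f_stop)."""
    stages, ln_f = 0, ln_f_start
    while ln_f >= ln_f_stop:
        ln_f /= 2.0          # f -> sqrt(f) halves ln f
        stages += 1
    return stages


print(is_flat([100, 105, 109]))   # True: 109 < 1.1 * 100
print(is_flat([100, 111]))        # False: 111 >= 1.1 * 100
print(refinement_stages())        # 17, because 2**-17 < 1e-5 <= 2**-16
```

Starting from $f = e$, the stopping condition is therefore reached after 17 flat histograms.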

From an intermediate set of facility locations $\mathbf{r}$, we generate the new set
$\mathbf{r}'$ by shifting one random facility one step to the left or to the right with
equal probability. Exceptions are made if the facility is already at one of the edges of
the line or adjacent to another facility. Let us define $v$ to be the number of facilities
on the edges ($q_1$ and $q_n$) plus twice the number of facility pairs occupying
neighboring demand points. Then the non-zero step probabilities are given by



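$$P[(r_1, r_2, \ldots, r_p) \to (r_1 - 1, r_2, \ldots, r_p)] = \frac{1}{2p - v} \quad \text{if } r_1 \neq q_1, \tag{A1}$$

$$P[(r_1, \ldots, r_{i-1}, r_i, r_{i+1}, \ldots, r_p) \to (r_1, \ldots, r_{i-1}, r_i - 1, r_{i+1}, \ldots, r_p)] = \frac{1}{2p - v} \quad \text{if } r_i - 1 \neq r_{i-1}, \; i = 2, \ldots, p, \tag{A2}$$

$$P[(r_1, \ldots, r_{i-1}, r_i, r_{i+1}, \ldots, r_p) \to (r_1, \ldots, r_{i-1}, r_i + 1, r_{i+1}, \ldots, r_p)] = \frac{1}{2p - v} \quad \text{if } r_i + 1 \neq r_{i+1}, \; i = 1, \ldots, p-1, \tag{A3}$$

$$P[(r_1, \ldots, r_{p-1}, r_p) \to (r_1, \ldots, r_{p-1}, r_p + 1)] = \frac{1}{2p - v} \quad \text{if } r_p \neq q_n. \tag{A4}$$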

This set of moves is ergodic and satisfies detailed balance. 
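Putting the pieces of this appendix together, a compact sketch of the random walk might look as follows. It is an illustration under several assumptions rather than the implementation used for the results above: demand points sit at positions 0, 1, ..., n-1, the estimate is stored as $\ln \Omega$ so that multiplying $\Omega$ by $f$ becomes adding $\ln f$, flatness is only tested every `check_every` steps and only over intervals that have been visited, the restriction to a window $[C_1, C_2]$ is omitted, and a `max_steps` cap guarantees termination. All function and parameter names are hypothetical, and $p < n$ is assumed.

```python
import math
import random


def cost(N, r):
    """Eq. (1) for demand points at positions 0, 1, ..., n-1."""
    return sum(Ni * min(abs(i - rj) for rj in r) for i, Ni in enumerate(N)) / sum(N)


def allowed_moves(r, n):
    """Single-step moves permitted by Eqs. (A1)-(A4), as (facility index, step) pairs."""
    moves = []
    for i, ri in enumerate(r):
        if ri > 0 and (i == 0 or r[i - 1] != ri - 1):
            moves.append((i, -1))      # step left: not at q_1, left site is free
        if ri < n - 1 and (i == len(r) - 1 or r[i + 1] != ri + 1):
            moves.append((i, +1))      # step right: not at q_n, right site is free
    return moves                       # contains 2p - v entries


def wang_landau(N, p, dC, ln_f_stop=1e-5, check_every=10_000, max_steps=10**7, seed=0):
    """Return {lower edge of cost interval: entropy estimate}, shifted so the maximum is 0."""
    rng = random.Random(seed)
    n = len(N)
    r = sorted(rng.sample(range(n), p))    # arbitrary initial configuration
    k = int(cost(N, r) // dC)              # index of the current cost interval
    ln_omega, H = {}, {}                   # unseen intervals implicitly have Omega = 1
    ln_f, steps = 1.0, 0                   # f = e initially
    while ln_f >= ln_f_stop and steps < max_steps:
        i, step = rng.choice(allowed_moves(r, n))   # each move has probability 1/(2p - v)
        r_new = r[:i] + [r[i] + step] + r[i + 1:]
        k_new = int(cost(N, r_new) // dC)
        ln_ratio = ln_omega.get(k, 0.0) - ln_omega.get(k_new, 0.0)
        if ln_ratio >= 0 or rng.random() < math.exp(ln_ratio):
            r, k = r_new, k_new            # accepted with probability min(1, Omega(C)/Omega(C'))
        ln_omega[k] = ln_omega.get(k, 0.0) + ln_f   # Omega -> f * Omega at the current interval
        H[k] = H.get(k, 0) + 1
        steps += 1
        if steps % check_every == 0 and max(H.values()) < 1.1 * min(H.values()):
            H = dict.fromkeys(H, 0)        # flat histogram: reset it ...
            ln_f /= 2.0                    # ... and replace f by sqrt(f)
    top = max(ln_omega.values())
    return {k * dC: s - top for k, s in sorted(ln_omega.items())}


# Toy run on a hypothetical population with 8 demand points and 2 facilities.
S = wang_landau([3, 1, 4, 1, 5, 9, 2, 6], p=2, dC=0.5, check_every=1000, max_steps=50_000)
for c, s in S.items():
    print(f"C >= {c:.1f}:  S = {s:.2f}")
```

Because the walk is random, the printed values depend on the seed and on the step cap; for production runs the cap would be chosen much larger than in this toy call.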

In principle, we are able to explore all costs between the globally minimal $C_{\min}$
and maximal $C_{\max}$. In practice, we have to reduce the search interval. On one hand,
$C_{\max}$ is orders of magnitude larger than $C_{\min}$ and we are interested only in
$\Omega$ near $C_{\min}$. On the other hand, $\Omega$ increases so quickly that, close to
$C_{\min}$, the random walk is extremely unlikely to propose a step decreasing the cost.
Therefore, we confine the random walk to intervals $[C_1, C_2]$ which become smaller as
$C_1$ approaches $C_{\min}$. We interpolate between all estimates of $\Omega$, which all
differ from the real $\Omega$ by a multiplicative constant, with a straightforward
least-squares algorithm to obtain a single curve for the entropy $S = \log(\Omega)$ over
all measured values of $C$. There is exactly one constant left to be fixed because the
Wang-Landau algorithm can calculate the entropy only up to an additive constant. We adopt
the normalization that $S = 0$ at the extrapolated maximum.
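One simple way to join entropy estimates from overlapping cost windows is sketched below. It is a simplified stand-in for the least-squares procedure mentioned above, not a reproduction of it: windows are merged one after another, and each new window is shifted by the mean difference on the overlap, which is the least-squares optimal constant offset for that pair. The result is normalized to zero at the largest sampled value rather than at an extrapolated maximum. Function and variable names are assumptions.

```python
def stitch(windows):
    """Merge entropy estimates {cost-bin index: ln Omega} from overlapping windows.

    windows: list of dicts ordered so that each window overlaps the merged
    result of all previous ones in at least one bin.
    """
    merged = dict(windows[0])
    for w in windows[1:]:
        common = [c for c in w if c in merged]
        # The constant shift minimizing the squared mismatch on the overlap
        # is the mean difference between the two estimates.
        shift = sum(merged[c] - w[c] for c in common) / len(common)
        for c, s in w.items():
            merged.setdefault(c, s + shift)   # keep earlier values on the overlap
    top = max(merged.values())
    return {c: s - top for c, s in sorted(merged.items())}


w1 = {0: 0.0, 1: 1.0, 2: 2.0}
w2 = {1: 5.0, 2: 6.0, 3: 7.5}   # same curve, offset by an unknown constant
print(stitch([w1, w2]))          # {0: -3.5, 1: -2.5, 2: -1.5, 3: 0.0}
```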